Some gaps and ambiguities will remain in the representative genome sequence until each letter of DNA has been sequenced approximately nine times. The working draft has just half of that information, so it contains gaps where, purely by chance, no sequence was obtained for a particular region. In other places, the chemical properties of certain stretches of DNA make those parts of the genome harder to capture and analyze. The human genome also contains many repeated sequences, which complicate accurate assembly of the complete sequence. Some repeats are short, some are long; some are present in a million copies, others are repeated only twice. Before the human genome sequence is considered finished, scientists must resolve all ambiguities that can be resolved and, one by one, close all gaps that can be closed with modern sequencing technology. Ultimately there will be no more than one error per 10,000 bases; in other words, the sequence will be 99.99% accurate. The finished human genome sequence is expected by 2003.
National Human Genome Research Institute
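
The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustration only, not the sequencing centers' method. It assumes that reads land uniformly at random along the genome, an idealized Poisson model that the text above does not state, under which the fraction of bases missed by chance at coverage c is roughly e^(-c). The function names are hypothetical, and the draft coverage of 4.5 is simply half of the nine-fold figure quoted above.

```python
import math


def expected_uncovered_fraction(coverage: float) -> float:
    """Fraction of bases expected to receive no reads at all.

    Assumes reads are placed uniformly at random (Poisson model),
    which is an idealization and not a claim from the source text.
    """
    return math.exp(-coverage)


def accuracy_from_error_rate(errors: int, bases: int) -> float:
    """Convert 'errors per so many bases' into a fractional accuracy."""
    return 1.0 - errors / bases


FINISHED_COVERAGE = 9.0                  # "approximately nine times"
DRAFT_COVERAGE = FINISHED_COVERAGE / 2   # "just half of that information"

for label, cov in [("working draft", DRAFT_COVERAGE),
                   ("finished", FINISHED_COVERAGE)]:
    missed = expected_uncovered_fraction(cov)
    print(f"{label}: coverage {cov:.1f}x, "
          f"fraction missed by chance ~ {missed:.6f}")

# One error per 10,000 bases, expressed as a percentage accuracy.
accuracy = accuracy_from_error_rate(1, 10_000)
print(f"accuracy: {accuracy * 100:.2f}%")
```

Under this simplified model, doubling the coverage sharply reduces the share of the genome left unsampled by chance. It does not account for the regions that are chemically hard to capture or for the repeated sequences, both of which need targeted work rather than more random sampling.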